Generating flexible proper name references in text: Data, models and evaluation

نویسندگان

  • Emiel Krahmer
  • Thiago Castro Ferreira
  • Sander Wubben
چکیده

This study introduces a statistical model able to generate variations of a proper name by taking into account the person to be mentioned, the discourse context and variation. The model relies on the REGnames corpus, a dataset with 53,102 proper name references to 1,000 people in different discourse contexts. We evaluate the versions of our model from the perspective of how human writers produce proper names, and also how human readers process them. The corpus1 and the model2 are publicly available.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards proper name generation: a corpus analysis

We introduce a corpus for the study of proper name generation. The corpus consists of proper name references to people in webpages, extracted from the Wikilinks corpus. In our analyses, we aim to identify the different ways, in terms of length and form, in which a proper names are produced throughout a text.

متن کامل

Designing a Bank-Based Flexible Performance Evaluation System (Study: Bank Shahr)

 Given the limitations of the existing performance evaluation models for organizations with dynamic internal and external conditions, this study aims to provide a flexible performance evaluation model with adaptability to intra- and extra-organizational changes. The present study first forms a database of criteria related to banking activities. After gathering the experts' opinions, we select 2...

متن کامل

Introducing of Dirichlet process prior in the Nonparametric Bayesian models frame work

Statistical models are utilized to learn about the mechanism that the data are generating from it. Often it is assumed that the random variables y_i,i=1,…,n ,are samples from the probability distribution F which is belong to a parametric distributions class. However, in practice, a parametric model may be inappropriate to describe the data. In this settings, the parametric assumption could be r...

متن کامل

Identifying Unknown Proper Names In Newswire Text

The identification of unknown proper names in text is a significant challenge for NLP systems operating on unrestricted text. A system which indexes documents according to name references can be useful for information retrieval or as a preprocessor for more knowledge intensive tasks such as database extraction. This paper describes a system which uses text skimming techniques for deriving prope...

متن کامل

Generating proper name pro for automatic speech

Generating correct pronunciation of proper names remains one of the most difficult tasks in text-to-phoneme transcription. Although phonetic rules can be efficient in processing proper names of one language, foreign family names cannot be always correctly generated without additional pronunciation rules. The present study addresses the problem of pronunciation variants for French and foreign fa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017